Loop Splitting for Superscalar Architectures

Authors

  • Andrew Ilin
  • L. Ridgway
Abstract

Program transformations and algorithm modifications are discussed that reduce execution time for iterative methods for solving partial differential equations on high-performance computers. Techniques typically associated with parallel computers turn out to be essential to obtain optimal performance on current superscalar uniprocessors. The tested programs were written in Fortran77 and run on a single-processor KSR-1, SGI Indigo, Cray C90, HP-735, RS-6000, DEC 3000/600 AXP (Alpha) and Sparc workstations. A performance model is developed and used to assess the experimental data.

Similar resources

A Comparison of Superscalar and Decoupled Access/Execute Architectures

This paper presents a comparison of superscalar and decoupled access/execute architectures. Both architectures attempt to exploit instruction-level parallelism by issuing multiple instructions per cycle, employing dynamic scheduling to maximize performance. Simulation results are presented for four different configurations, demonstrating that the architectural queues of the decoupled machines p...

Java Optimization for Superscalar and Vector Architectures

This paper describes the refactoring of Java code to take advantage of the superscalar and vector architectures available on many modern desktop computers. The unrolling of Java loops is shown to cause some speed-ups for Java code. However, our benchmarks reveal that Java still lags behind vectorized C code. The present state-of-the-art in computer hardware has outpaced the current state of the...

Portable Compilation of Vector Expressions for Architectures with Memory Hierarchy

The paper presents a scheme of code generation for vector expressions implemented in the CC] compiler (CC] is a vector ANSI C superset aimed at vector and superscalar architectures). The scheme is based on two well-known optimization techniques: loop-invariant code motion and iteration space tiling. The problem of finding the optimal tile size for the imperfectly nested loop system implementing ...

Efficiency of microSIMD architectures and index-mapped data for media processors

We show that microSIMD architectures are more efficient for media processing than other parallel architectures like SIMD or MIMD parallel processor architectures, and VLIW or superscalar architectures. We define alternative mappings of data onto subwords, and show that the index mapping is an ideal mapping for achieving maximal subword parallelism with minimal revamping of the original serial l...

Software pipelining for Jetpipeline architecture

High performance processors based on pipeline processing play an important role in scientific computation. We have proposed a hybrid pipeline architecture named Jetpipeline in our former work. The concept of Jetpipeline comes from the integration of superscalar, VLIW and vector architectures. Jetpipeline has multiple instruction pipelines, which execute multiple instructions like superscalar ar...

Publication date: 1995